-
Notifications
You must be signed in to change notification settings - Fork 0
Automated Test: embed-url-handling-post #319
Conversation
📝 WalkthroughWalkthroughThis pull request introduces embedding functionality, allowing external websites to embed Discourse discussions via iframe, alongside RSS/ATOM feed polling capabilities to automatically create topics from remote feed sources. Changes
Sequence Diagram(s)sequenceDiagram
participant Browser
participant ExternalSite
participant DiscourseServer
participant Redis
participant DB as Database
ExternalSite->>Browser: Load page with embed.js
Browser->>DiscourseServer: Create iframe to /embed/best?embed_url=...
DiscourseServer->>DiscourseServer: ensure_embeddable validates referer
DiscourseServer->>DB: Check TopicEmbed for embed_url
alt Topic exists
DB-->>DiscourseServer: topic_id found
DiscourseServer->>DB: Fetch TopicView (best posts)
DB-->>DiscourseServer: posts data
else Topic doesn't exist
DB-->>DiscourseServer: no topic_id
DiscourseServer->>Redis: Check throttle for embed_url
alt Not throttled
Redis-->>DiscourseServer: proceed
DiscourseServer->>DiscourseServer: Queue RetrieveTopic job
else Recently retrieved
Redis-->>DiscourseServer: throttled
end
DiscourseServer->>Browser: Render loading view
Browser->>Browser: Auto-reload after 30s
end
DiscourseServer->>Browser: Render embedded comments
Browser->>Browser: postMessage listener ready
Browser-->>ExternalSite: Set iframe height
sequenceDiagram
participant Scheduler
participant PollFeed as PollFeed Job
participant SimpleRSS
participant RemoteFeed as Remote Feed
participant TopicEmbed
participant DB as Database
Scheduler->>PollFeed: Execute (hourly)
PollFeed->>PollFeed: Validate settings (enabled, url, username)
alt Settings incomplete
PollFeed->>PollFeed: Return early
else All settings present
PollFeed->>DB: Resolve embed_by_username to User
PollFeed->>RemoteFeed: Fetch feed URL
RemoteFeed-->>SimpleRSS: Feed XML/ATOM
SimpleRSS-->>PollFeed: Parsed feed items
loop For each feed item
PollFeed->>PollFeed: Extract URL and content
PollFeed->>PollFeed: Unescape HTML
PollFeed->>TopicEmbed: import(user, url, title, content)
TopicEmbed->>DB: Create or update topic/post
DB-->>TopicEmbed: Result
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 13
🤖 Fix all issues with AI agents
In `@app/assets/javascripts/embed.js`:
- Around line 5-12: The code assumes the DOM element with id
'discourse-comments' exists and calls comments.appendChild(iframe), which will
throw if comments is null; update the logic where comments is obtained (variable
comments referencing document.getElementById('discourse-comments')) to guard
against null by checking if comments is truthy before creating/appending the
iframe (or return/exit early), so only create iframe and set
iframe.src/attributes and call comments.appendChild(iframe) when comments is
present.
- Around line 15-24: postMessageReceived currently uses
discourseUrl.indexOf(e.origin) which is vulnerable to host-substring bypasses;
change the origin check to compare origins exactly by using the URL origin
comparison (e.g. compute new URL(discourseUrl).origin and compare it ===
e.origin) or maintain a whitelist array of allowed origins and check e.origin
against that list with strict equality; update the validation in
postMessageReceived to use e.origin === allowedOrigin (or
allowedOrigins.includes(e.origin)) and leave the rest of the message handling
(e.data.type === 'discourse-resize' and iframe.height assignment) unchanged.
In `@app/jobs/scheduled/poll_feed.rb`:
- Around line 24-38: The poll_feed method lacks network/feed error handling and
calls i.content.scrub without nil checks; wrap the feed fetch and parse
(open(SiteSetting.feed_polling_url) and SimpleRSS.parse) in a begin/rescue to
catch and log exceptions, and skip processing on failure, and before calling
.scrub on i.content ensure i.content is present (e.g., use a safe fallback like
an empty string or i.description) so TopicEmbed.import is never called with nil
content; log any errors with context (feed URL and exception) and continue
processing remaining items.
In `@app/models/topic_embed.rb`:
- Line 13: The HTML injection risk comes from interpolating the raw url into the
I18n string in topic_embed.rb; change the construction so the URL is
HTML-escaped before embedding (e.g., use ERB::Util.html_escape or CGI.escapeHTML
on the url) and ensure the href uses a validated/sanitized http(s) URI (parse
and reject non-http(s) schemes) before escaping; update the line that builds
contents (the interpolation of url inside I18n.t('embed.imported_from', link:
...)) to use the escaped/sanitized value so the rendered <a href='...'> and link
text cannot break out or inject XSS.
- Around line 44-53: The import_remote method currently calls open(url) and
Readability::Document without validation or error handling; replace the direct
open call in TopicEmbed.import_remote with a safe fetch: validate and parse the
URL (only allow http/https), resolve the host and reject private/internal IPs,
enforce a max redirect count, use a HTTP client (e.g., Net::HTTP) with explicit
open/read timeouts and a maximum allowed content-length, and limit the HTML size
read; wrap the fetch and Readability::Document creation in a rescue block to
handle timeouts, network errors, invalid content, and oversized responses and
return or raise a controlled error instead of crashing, keeping the call site
TopicEmbed.import(user, url, opts[:title] || doc.title, doc.content) unchanged.
In `@app/views/embed/best.html.erb`:
- Around line 3-5: The two link_to calls that open in a new tab (the ones
calling I18n.t('embed.title') and I18n.t('embed.start_discussion') with target:
'_blank' and using `@topic_view.topic.url`) need a rel attribute to prevent
reverse-tabnabbing; update each link_to invocation to include rel: 'noopener
noreferrer' alongside the existing class and target options.
- Around line 16-17: The avatar <img> lacks alt text; update the tag that uses
post.user.small_avatar_url to include an alt attribute (e.g., using
post.user.username or a fallback like "#{post.user.username} avatar") so screen
readers receive a meaningful description; ensure you use the same
post.user.username symbol for the alt and handle nil/blank username with a
sensible default.
In `@app/views/layouts/embed.html.erb`:
- Around line 9-12: The postMessage call currently uses request.referer as the
targetOrigin which is unreliable and unsafe; update the embed HTML so
parent.postMessage uses a validated origin instead—prefer reading
SiteSetting.embeddable_host (or a derived safeOrigin constant) and fall back to
a safe default only if explicitly configured; ensure you validate/normalize the
origin before passing it to parent.postMessage and avoid using raw
request.referer directly in the targetOrigin parameter (refer to the
parent.postMessage usage and request.referer in embed.html.erb).
In `@config/routes.rb`:
- Around line 245-246: The project is using Rails 3.2.12 (Gemfile entry for
rails) which is vulnerable to CVE-2014-0130; update the Rails dependency in the
Gemfile to a secure version (preferably upgrade the app to Rails 4.x as the
codebase already supports it) then run bundle update to refresh gems, run the
test suite and fix any deprecations/migration issues (routes like the new get
'embed/best' => 'embed#best' may need no change but verify routing and globbing
behavior), and ensure all security-related tests/passive scans are re-run before
merging.
In `@lib/post_revisor.rb`:
- Around line 85-86: In update_post, capture the boolean result of
`@post.save`(validate: !@opts[:skip_validations]) and handle a failure the same
way PostCreator and TopicCreator do: if save returns false (or exception occurs)
raise a runtime error (or log and raise) to prevent subsequent
save_reply_relationships from running and creating inconsistent state; reference
update_post, `@post.save`(validate: !@opts[:skip_validations]), and
save_reply_relationships when making the change so the save outcome is checked
and failures are surfaced consistently.
In `@lib/topic_retriever.rb`:
- Around line 48-50: The fetch_http method calls
SiteSetting.embed_by_username.downcase which will raise if the setting is nil;
update fetch_http to guard the setting before calling downcase (e.g., check
SiteSetting.embed_by_username.blank? or use to_s/safe navigation) and return
early if unset, then proceed to lookup User.where(username_lower: ...) so that
User lookup only runs when embed_by_username is present; reference fetch_http,
SiteSetting.embed_by_username, and User.where to locate the change.
- Around line 27-30: The current two-step throttle uses
$redis.setnx(retrieved_key, "1") followed by $redis.expire(retrieved_key, 60),
which can leave the key permanent on a crash; replace this with a single atomic
Redis SET that uses NX and EX options (i.e., use the Redis client's SET with NX
and a 60s TTL) and remove the separate expire call so the key is created only
when absent and always has the expiration.
In `@spec/jobs/poll_feed_spec.rb`:
- Around line 18-27: Make the specs deterministic by stubbing
SiteSetting.feed_polling_enabled? to true in each example and ensuring only the
intended setting is nil; for the "requires feed_polling_url" test stub
SiteSetting.feed_polling_enabled? => true and stub SiteSetting.embed_by_username
to a valid non-nil value before calling poller.execute, and for the "requires
embed_by_username" test stub SiteSetting.feed_polling_enabled? => true and stub
SiteSetting.feed_polling_url to a valid non-nil value so poller.execute only
fails due to the specifically tested missing setting (references:
poller.execute, poller.expects(:poll_feed).never,
SiteSetting.stubs(:feed_polling_url), SiteSetting.stubs(:embed_by_username),
SiteSetting.stubs(:feed_polling_enabled?)).
🧹 Nitpick comments (10)
app/assets/stylesheets/embed.css.scss (1)
1-69: LGTM with minor suggestion: consider using variables for colors and dimensions.The stylesheet is well-structured. For maintainability, consider extracting repeated values into SCSS variables:
// At top of file $embed-border-color: `#ddd`; $embed-muted-text: `#aaa`; $embed-author-color: `#4a6b82`; $embed-author-width: 92px;This is optional and can be deferred if consistent with existing codebase patterns.
db/migrate/20131223171005_create_top_topics.rb (1)
3-3: Avoidforce: truein table creation migrations.Using
force: trueincreate_tablewill drop and recreate the table if it already exists. While this pattern was common during Discourse's early development (2013), it's not a Rails best practice and creates unnecessary fragility. In production, migrations only run once and are tracked, so the data loss risk is lower than it might appear, but the pattern should still be avoided in favor of explicit error handling or separate drop/create migrations with appropriate safeguards.This pattern appears in 11 migrations across the codebase from 2012–2013 and should be addressed as part of a broader migration cleanup effort.
Gemfile (1)
209-211: Pin gem versions to ensure reproducibility and compatibility.The new gems lack version constraints. Without pinning, future
bundle updateruns could introduce breaking changes. Specify compatible versions:# required for feed importing and embedding -gem 'ruby-readability', require: false -gem 'simple-rss', require: false +gem 'ruby-readability', '~> 0.7', require: false +gem 'simple-rss', '~> 1.3', require: falseNote:
simple-rsshasn't been updated since April 2018—consider verifying it meets your needs before locking to this version.db/migrate/20131217174004_create_topic_embeds.rb (1)
3-11: Consider adding indexes on topic_id and post_id for query performance.The table lacks indexes on
topic_idandpost_id, which may be used for lookups when retrieving embeds by topic or post. Additionally, consider adding foreign key constraints for referential integrity.♻️ Proposed improvement
add_index :topic_embeds, :embed_url, unique: true + add_index :topic_embeds, :topic_id + add_index :topic_embeds, :post_id end endapp/jobs/regular/retrieve_topic.rb (2)
1-2: Remove unusedrequire_dependency.The
email/senderdependency appears unused in this job. Onlytopic_retrieveris needed.♻️ Proposed fix
-require_dependency 'email/sender' require_dependency 'topic_retriever'
4-22: Minor formatting: Extra empty lines at module boundaries.Static analysis flagged extra empty lines at lines 5-6 and 21-22. This is a minor convention issue per RuboCop's
Layout/EmptyLinesAroundModuleBodyrule.♻️ Proposed fix
module Jobs - # Asynchronously retrieve a topic from an embedded site class RetrieveTopic < Jobs::Base # ... (no changes to execute method) end - endapp/models/topic_embed.rb (1)
56-76: Port handling may produce incorrect URLs for standard ports.Line 59 appends the port when it's neither 80 nor 443, but this doesn't account for the scheme. For example, an HTTPS URL on port 80 (non-standard) should include the port, but currently wouldn't. Also, the
aandimgprocessing blocks are duplicated.♻️ Proposed refactor to fix port logic and reduce duplication
def self.absolutize_urls(url, contents) uri = URI(url) prefix = "#{uri.scheme}://#{uri.host}" - prefix << ":#{uri.port}" if uri.port != 80 && uri.port != 443 + # Only include port if it differs from the scheme's default + default_port = uri.scheme == 'https' ? 443 : 80 + prefix << ":#{uri.port}" if uri.port != default_port fragment = Nokogiri::HTML.fragment(contents) - fragment.css('a').each do |a| - href = a['href'] - if href.present? && href.start_with?('/') - a['href'] = "#{prefix}/#{href.sub(/^\/+/, '')}" - end - end - fragment.css('img').each do |a| - src = a['src'] - if src.present? && src.start_with?('/') - a['src'] = "#{prefix}/#{src.sub(/^\/+/, '')}" + %w[a img].each do |tag| + attr = tag == 'a' ? 'href' : 'src' + fragment.css(tag).each do |el| + value = el[attr] + if value.present? && value.start_with?('/') + el[attr] = "#{prefix}/#{value.sub(/^\/+/, '')}" + end end end fragment.to_html endapp/jobs/scheduled/poll_feed.rb (1)
20-22: Unused method:feed_keyis defined but never called.This method generates a cache key but isn't used anywhere in the current implementation. Consider removing it if caching isn't planned, or implement the intended caching logic.
lib/topic_retriever.rb (1)
40-43: Synchronous feed polling may cause request timeouts.Calling
Jobs::PollFeed.new.execute({})synchronously in the request path could significantly delay the response while the entire feed is fetched and processed. Consider checking if this is truly necessary here or if the feed polling should remain solely as a scheduled background job.app/views/embed/loading.html.erb (1)
6-11: No reload limit: page will refresh indefinitely until content loads.The script reloads the page every 30 seconds with no limit. If the content never becomes available (e.g., fetch failure, invalid URL), users will experience an infinite reload loop. Consider adding a maximum retry count or displaying an error after several attempts.
♻️ Proposed fix with retry limit
<script> (function() { + var maxRetries = 10; + var retryCount = parseInt(sessionStorage.getItem('embedRetryCount') || '0'); + + if (retryCount >= maxRetries) { + document.querySelector('header').innerHTML = '<%= t("embed.failed_to_load") %>'; + sessionStorage.removeItem('embedRetryCount'); + return; + } + + sessionStorage.setItem('embedRetryCount', retryCount + 1); setTimeout(function() { document.location.reload(); }, 30000); })(); </script>
| var comments = document.getElementById('discourse-comments'), | ||
| iframe = document.createElement('iframe'); | ||
| iframe.src = discourseUrl + "embed/best?embed_url=" + encodeURIComponent(discourseEmbedUrl); | ||
| iframe.id = 'discourse-embed-frame'; | ||
| iframe.width = "100%"; | ||
| iframe.frameBorder = "0"; | ||
| iframe.scrolling = "no"; | ||
| comments.appendChild(iframe); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add null check for the target DOM element.
If the discourse-comments element doesn't exist on the page, comments.appendChild(iframe) will throw an error. Consider adding defensive validation.
🛡️ Proposed fix
var comments = document.getElementById('discourse-comments'),
iframe = document.createElement('iframe');
+if (!comments) { return; }
iframe.src = discourseUrl + "embed/best?embed_url=" + encodeURIComponent(discourseEmbedUrl);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| var comments = document.getElementById('discourse-comments'), | |
| iframe = document.createElement('iframe'); | |
| iframe.src = discourseUrl + "embed/best?embed_url=" + encodeURIComponent(discourseEmbedUrl); | |
| iframe.id = 'discourse-embed-frame'; | |
| iframe.width = "100%"; | |
| iframe.frameBorder = "0"; | |
| iframe.scrolling = "no"; | |
| comments.appendChild(iframe); | |
| var comments = document.getElementById('discourse-comments'), | |
| iframe = document.createElement('iframe'); | |
| if (!comments) { return; } | |
| iframe.src = discourseUrl + "embed/best?embed_url=" + encodeURIComponent(discourseEmbedUrl); | |
| iframe.id = 'discourse-embed-frame'; | |
| iframe.width = "100%"; | |
| iframe.frameBorder = "0"; | |
| iframe.scrolling = "no"; | |
| comments.appendChild(iframe); |
🤖 Prompt for AI Agents
In `@app/assets/javascripts/embed.js` around lines 5 - 12, The code assumes the
DOM element with id 'discourse-comments' exists and calls
comments.appendChild(iframe), which will throw if comments is null; update the
logic where comments is obtained (variable comments referencing
document.getElementById('discourse-comments')) to guard against null by checking
if comments is truthy before creating/appending the iframe (or return/exit
early), so only create iframe and set iframe.src/attributes and call
comments.appendChild(iframe) when comments is present.
| function postMessageReceived(e) { | ||
| if (!e) { return; } | ||
| if (discourseUrl.indexOf(e.origin) === -1) { return; } | ||
|
|
||
| if (e.data) { | ||
| if (e.data.type === 'discourse-resize' && e.data.height) { | ||
| iframe.height = e.data.height + "px"; | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Origin validation is vulnerable to bypass attacks.
The indexOf check on line 17 can be bypassed by malicious origins. For example, if discourseUrl is "https://discourse.example.com", an attacker hosting content at "https://discourse.example.com.evil.com" would pass this check since indexOf would return 0.
🔒 Proposed fix for stricter origin validation
function postMessageReceived(e) {
if (!e) { return; }
- if (discourseUrl.indexOf(e.origin) === -1) { return; }
+ // Extract origin from discourseUrl for strict comparison
+ var defined = document.createElement('a');
+ defined.href = discourseUrl;
+ var expectedOrigin = defined.protocol + '//' + defined.host;
+ if (e.origin !== expectedOrigin) { return; }
if (e.data) {🤖 Prompt for AI Agents
In `@app/assets/javascripts/embed.js` around lines 15 - 24, postMessageReceived
currently uses discourseUrl.indexOf(e.origin) which is vulnerable to
host-substring bypasses; change the origin check to compare origins exactly by
using the URL origin comparison (e.g. compute new URL(discourseUrl).origin and
compare it === e.origin) or maintain a whitelist array of allowed origins and
check e.origin against that list with strict equality; update the validation in
postMessageReceived to use e.origin === allowedOrigin (or
allowedOrigins.includes(e.origin)) and leave the rest of the message handling
(e.data.type === 'discourse-resize' and iframe.height assignment) unchanged.
| def poll_feed | ||
| user = User.where(username_lower: SiteSetting.embed_by_username.downcase).first | ||
| return if user.blank? | ||
|
|
||
| require 'simple-rss' | ||
| rss = SimpleRSS.parse open(SiteSetting.feed_polling_url) | ||
|
|
||
| rss.items.each do |i| | ||
| url = i.link | ||
| url = i.id if url.blank? || url !~ /^https?\:\/\// | ||
|
|
||
| content = CGI.unescapeHTML(i.content.scrub) | ||
| TopicEmbed.import(user, url, i.title, content) | ||
| end | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing error handling and potential NoMethodError on nil content.
The poll_feed method lacks error handling for network failures and malformed feeds. Additionally, line 35 calls .scrub on i.content which will raise NoMethodError if the feed item has no content field.
🐛 Proposed fix with error handling and nil safety
def poll_feed
user = User.where(username_lower: SiteSetting.embed_by_username.downcase).first
return if user.blank?
- require 'simple-rss'
- rss = SimpleRSS.parse open(SiteSetting.feed_polling_url)
-
- rss.items.each do |i|
- url = i.link
- url = i.id if url.blank? || url !~ /^https?\:\/\//
-
- content = CGI.unescapeHTML(i.content.scrub)
- TopicEmbed.import(user, url, i.title, content)
+ begin
+ require 'simple-rss'
+ feed_response = URI.open(SiteSetting.feed_polling_url, read_timeout: 30, open_timeout: 10)
+ rss = SimpleRSS.parse(feed_response)
+
+ rss.items.each do |i|
+ url = i.link
+ url = i.id if url.blank? || url !~ /^https?\:\/\//
+
+ raw_content = i.content || i.description || ''
+ content = CGI.unescapeHTML(raw_content.to_s.scrub)
+ TopicEmbed.import(user, url, i.title, content)
+ end
+ rescue StandardError => e
+ Rails.logger.error("PollFeed failed: #{e.message}")
end
end🧰 Tools
🪛 Brakeman (8.0.1)
[medium] 29-29: Model attribute used in file name
Type: File Access
Confidence: Medium
More info: https://brakemanscanner.org/docs/warning_types/file_access/
(File Access)
🤖 Prompt for AI Agents
In `@app/jobs/scheduled/poll_feed.rb` around lines 24 - 38, The poll_feed method
lacks network/feed error handling and calls i.content.scrub without nil checks;
wrap the feed fetch and parse (open(SiteSetting.feed_polling_url) and
SimpleRSS.parse) in a begin/rescue to catch and log exceptions, and skip
processing on failure, and before calling .scrub on i.content ensure i.content
is present (e.g., use a safe fallback like an empty string or i.description) so
TopicEmbed.import is never called with nil content; log any errors with context
(feed URL and exception) and continue processing remaining items.
| def self.import(user, url, title, contents) | ||
| return unless url =~ /^https?\:\/\// | ||
|
|
||
| contents << "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{url}'>#{url}</a>")}</small>\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential XSS: URL is interpolated into HTML without escaping.
The url parameter is directly embedded into the HTML string. While validated against an HTTP(S) regex, URLs can still contain characters that break out of the href attribute or introduce XSS payloads (e.g., javascript: protocol after redirects, or characters like " and >).
🛡️ Proposed fix using ERB::Util for escaping
+ require 'erb'
+
# Import an article from a source (RSS/Atom/Other)
def self.import(user, url, title, contents)
return unless url =~ /^https?\:\/\//
- contents << "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{url}'>#{url}</a>")}</small>\n"
+ escaped_url = ERB::Util.html_escape(url)
+ contents << "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{escaped_url}'>#{escaped_url}</a>")}</small>\n"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| contents << "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{url}'>#{url}</a>")}</small>\n" | |
| require 'erb' | |
| # Import an article from a source (RSS/Atom/Other) | |
| def self.import(user, url, title, contents) | |
| return unless url =~ /^https?\:\/\// | |
| escaped_url = ERB::Util.html_escape(url) | |
| contents << "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{escaped_url}'>#{escaped_url}</a>")}</small>\n" | |
| end |
🤖 Prompt for AI Agents
In `@app/models/topic_embed.rb` at line 13, The HTML injection risk comes from
interpolating the raw url into the I18n string in topic_embed.rb; change the
construction so the URL is HTML-escaped before embedding (e.g., use
ERB::Util.html_escape or CGI.escapeHTML on the url) and ensure the href uses a
validated/sanitized http(s) URI (parse and reject non-http(s) schemes) before
escaping; update the line that builds contents (the interpolation of url inside
I18n.t('embed.imported_from', link: ...)) to use the escaped/sanitized value so
the rendered <a href='...'> and link text cannot break out or inject XSS.
| def self.import_remote(user, url, opts=nil) | ||
| require 'ruby-readability' | ||
|
|
||
| opts = opts || {} | ||
| doc = Readability::Document.new(open(url).read, | ||
| tags: %w[div p code pre h1 h2 h3 b em i strong a img], | ||
| attributes: %w[href src]) | ||
|
|
||
| TopicEmbed.import(user, url, opts[:title] || doc.title, doc.content) | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Security: SSRF vulnerability and missing error handling in import_remote.
Using open(url) directly with user-controlled URLs exposes the application to Server-Side Request Forgery (SSRF) attacks. An attacker could potentially access internal network resources. Additionally, there's no timeout, error handling, or content-length limit which could lead to resource exhaustion.
🔒 Proposed fix with validation and error handling
def self.import_remote(user, url, opts=nil)
require 'ruby-readability'
opts = opts || {}
+
+ begin
+ uri = URI.parse(url)
+ raise "Invalid URL scheme" unless %w[http https].include?(uri.scheme)
+
+ # Add timeout and redirect limits
+ response = uri.open(read_timeout: 30, open_timeout: 10)
+ html_content = response.read(1.megabyte) # Limit content size
+ rescue StandardError => e
+ Rails.logger.error("Failed to fetch embed URL #{url}: #{e.message}")
+ return nil
+ end
+
- doc = Readability::Document.new(open(url).read,
+ doc = Readability::Document.new(html_content,
tags: %w[div p code pre h1 h2 h3 b em i strong a img],
attributes: %w[href src])
TopicEmbed.import(user, url, opts[:title] || doc.title, doc.content)
end🤖 Prompt for AI Agents
In `@app/models/topic_embed.rb` around lines 44 - 53, The import_remote method
currently calls open(url) and Readability::Document without validation or error
handling; replace the direct open call in TopicEmbed.import_remote with a safe
fetch: validate and parse the URL (only allow http/https), resolve the host and
reject private/internal IPs, enforce a max redirect count, use a HTTP client
(e.g., Net::HTTP) with explicit open/read timeouts and a maximum allowed
content-length, and limit the HTML size read; wrap the fetch and
Readability::Document creation in a rescue block to handle timeouts, network
errors, invalid content, and oversized responses and return or raise a
controlled error instead of crashing, keeping the call site
TopicEmbed.import(user, url, opts[:title] || doc.title, doc.content) unchanged.
| get 'embed/best' => 'embed#best' | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Route addition looks good, but Rails version has critical vulnerability.
The new embed route follows existing conventions. However, static analysis flagged that Rails 3.2.12 (in Gemfile line 59) is vulnerable to CVE-2014-0130—a directory traversal and remote code execution vulnerability affecting globbing routes. This is a critical security issue that should be prioritized.
🔒 Recommended fix: Upgrade Rails
In the Gemfile (Rails 3.x branch):
- gem 'rails', '3.2.12'
+ gem 'rails', '~> 3.2.22'Or better yet, migrate to Rails 4.x which is already supported in this codebase.
🧰 Tools
🪛 Brakeman (8.0.1)
[medium] Rails 3.2.12 with globbing routes is vulnerable to directory traversal and remote code execution. Patch or upgrade to Rails 3.2.18
Type: Remote Code Execution
Confidence: Medium
More info: http://matasano.com/research/AnatomyOfRailsVuln-CVE-2014-0130.pdf
(Remote Code Execution)
🤖 Prompt for AI Agents
In `@config/routes.rb` around lines 245 - 246, The project is using Rails 3.2.12
(Gemfile entry for rails) which is vulnerable to CVE-2014-0130; update the Rails
dependency in the Gemfile to a secure version (preferably upgrade the app to
Rails 4.x as the codebase already supports it) then run bundle update to refresh
gems, run the test suite and fix any deprecations/migration issues (routes like
the new get 'embed/best' => 'embed#best' may need no change but verify routing
and globbing behavior), and ensure all security-related tests/passive scans are
re-run before merging.
| @post.save(validate: !@opts[:skip_validations]) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# First, let's look at the actual code at lines 85-86 and surrounding context
cd / && find . -name "post_revisor.rb" -type f 2>/dev/null | head -5Repository: admin-coderabbit/coderabbit-discourse-mirror
Length of output: 127
🏁 Script executed:
# Check the file structure to find the correct path
git ls-files | grep -i post_revisorRepository: admin-coderabbit/coderabbit-discourse-mirror
Length of output: 144
🏁 Script executed:
# View the code around lines 85-86
cat -n lib/post_revisor.rb | sed -n '75,100p'Repository: admin-coderabbit/coderabbit-discourse-mirror
Length of output: 1077
🏁 Script executed:
# See the method name containing this code
cat -n lib/post_revisor.rb | sed -n '60,90p'Repository: admin-coderabbit/coderabbit-discourse-mirror
Length of output: 1299
🏁 Script executed:
# Find all uses of skip_validations in the codebase
rg -n 'skip_validations' --type rubyRepository: admin-coderabbit/coderabbit-discourse-mirror
Length of output: 1541
🏁 Script executed:
# Check the save_reply_relationships method implementation
rg -A 10 'def save_reply_relationships' lib/post_revisor.rbRepository: admin-coderabbit/coderabbit-discourse-mirror
Length of output: 70
🏁 Script executed:
# Find save_reply_relationships - likely in Post model
rg -B 2 -A 15 'def save_reply_relationships' --type rubyRepository: admin-coderabbit/coderabbit-discourse-mirror
Length of output: 953
🏁 Script executed:
# Look at how PostCreator handles the save return value
cat -n lib/post_creator.rb | sed -n '225,240p'Repository: admin-coderabbit/coderabbit-discourse-mirror
Length of output: 525
🏁 Script executed:
# Look at how TopicCreator handles the save return value
cat -n lib/topic_creator.rb | sed -n '70,85p'Repository: admin-coderabbit/coderabbit-discourse-mirror
Length of output: 591
🏁 Script executed:
# Check the revise! method in post_revisor to see if it handles errors
rg -B 5 -A 30 'def revise!' lib/post_revisor.rb | head -60Repository: admin-coderabbit/coderabbit-discourse-mirror
Length of output: 700
🏁 Script executed:
# Check if revise! method does any error handling or transaction management
cat -n lib/post_revisor.rb | sed -n '1,50p'Repository: admin-coderabbit/coderabbit-discourse-mirror
Length of output: 1416
🏁 Script executed:
# Check the context where skip_validations is used with revise!
cat -n app/models/topic_embed.rb | sed -n '30,45p'Repository: admin-coderabbit/coderabbit-discourse-mirror
Length of output: 647
🏁 Script executed:
# Check if there's any transaction/error handling around revise! calls
rg -B 5 -A 5 'revise!' lib/ app/ --type ruby | grep -A 10 -B 10 'skip_validations'Repository: admin-coderabbit/coderabbit-discourse-mirror
Length of output: 1055
Handle failed saves when skip_validations is enabled.
In the update_post method, @post.save(validate: !@opts[:skip_validations]) doesn't capture the return value, so if save fails and skip_validations is true, the failure goes unnoticed. Line 87's save_reply_relationships still executes, potentially creating inconsistent state. This is inconsistent with how PostCreator (line 232) and TopicCreator (line 78) handle the same pattern—both check the return value and raise exceptions on failure.
Consider either raising an exception (consistent with PostCreator/TopicCreator pattern) or logging the failure:
Suggested fix
- `@post.save`(validate: !@opts[:skip_validations])
-
+ unless `@post.save`(validate: !@opts[:skip_validations])
+ Rails.logger.warn("PostRevisor: Failed to save post #{`@post.id`}: #{`@post.errors.full_messages.join`(', ')}")
+ end
`@post.save_reply_relationships`📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @post.save(validate: !@opts[:skip_validations]) | |
| unless `@post.save`(validate: !@opts[:skip_validations]) | |
| Rails.logger.warn("PostRevisor: Failed to save post #{`@post.id`}: #{`@post.errors.full_messages.join`(', ')}") | |
| end |
🤖 Prompt for AI Agents
In `@lib/post_revisor.rb` around lines 85 - 86, In update_post, capture the
boolean result of `@post.save`(validate: !@opts[:skip_validations]) and handle a
failure the same way PostCreator and TopicCreator do: if save returns false (or
exception occurs) raise a runtime error (or log and raise) to prevent subsequent
save_reply_relationships from running and creating inconsistent state; reference
update_post, `@post.save`(validate: !@opts[:skip_validations]), and
save_reply_relationships when making the change so the save outcome is checked
and failures are surfaced consistently.
| if $redis.setnx(retrieved_key, "1") | ||
| $redis.expire(retrieved_key, 60) | ||
| return false | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Race condition in Redis throttling: key may persist indefinitely.
If the process crashes between setnx (line 27) and expire (line 28), the key will never expire and will block all future retrievals for that URL. Use SET with EX and NX options atomically instead.
🐛 Proposed fix using atomic SET with EX and NX
def retrieved_recently?
# We can disable the throttle for some users, such as staff
return false if `@opts`[:no_throttle]
# Throttle other users to once every 60 seconds
retrieved_key = "retrieved:#{`@embed_url`}"
- if $redis.setnx(retrieved_key, "1")
- $redis.expire(retrieved_key, 60)
- return false
- end
-
- true
+ # SET with NX and EX is atomic - prevents key from persisting if process crashes
+ !$redis.set(retrieved_key, "1", ex: 60, nx: true)
end📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if $redis.setnx(retrieved_key, "1") | |
| $redis.expire(retrieved_key, 60) | |
| return false | |
| end | |
| def retrieved_recently? | |
| # We can disable the throttle for some users, such as staff | |
| return false if `@opts`[:no_throttle] | |
| # Throttle other users to once every 60 seconds | |
| retrieved_key = "retrieved:#{`@embed_url`}" | |
| # SET with NX and EX is atomic - prevents key from persisting if process crashes | |
| !$redis.set(retrieved_key, "1", ex: 60, nx: true) | |
| end |
🤖 Prompt for AI Agents
In `@lib/topic_retriever.rb` around lines 27 - 30, The current two-step throttle
uses $redis.setnx(retrieved_key, "1") followed by $redis.expire(retrieved_key,
60), which can leave the key permanent on a crash; replace this with a single
atomic Redis SET that uses NX and EX options (i.e., use the Redis client's SET
with NX and a 60s TTL) and remove the separate expire call so the key is created
only when absent and always has the expiration.
| def fetch_http | ||
| user = User.where(username_lower: SiteSetting.embed_by_username.downcase).first | ||
| return if user.blank? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential NoMethodError if embed_by_username is nil.
Line 49 calls .downcase on SiteSetting.embed_by_username without a nil check. If the setting is unconfigured, this will raise NoMethodError.
🛡️ Proposed fix with safe navigation
def fetch_http
- user = User.where(username_lower: SiteSetting.embed_by_username.downcase).first
+ username = SiteSetting.embed_by_username
+ return if username.blank?
+
+ user = User.where(username_lower: username.downcase).first
return if user.blank?
TopicEmbed.import_remote(user, `@embed_url`)
end🤖 Prompt for AI Agents
In `@lib/topic_retriever.rb` around lines 48 - 50, The fetch_http method calls
SiteSetting.embed_by_username.downcase which will raise if the setting is nil;
update fetch_http to guard the setting before calling downcase (e.g., check
SiteSetting.embed_by_username.blank? or use to_s/safe navigation) and return
early if unset, then proceed to lookup User.where(username_lower: ...) so that
User lookup only runs when embed_by_username is present; reference fetch_http,
SiteSetting.embed_by_username, and User.where to locate the change.
| it "requires feed_polling_url" do | ||
| SiteSetting.stubs(:feed_polling_url).returns(nil) | ||
| poller.expects(:poll_feed).never | ||
| poller.execute({}) | ||
| end | ||
|
|
||
| it "requires embed_by_username" do | ||
| SiteSetting.stubs(:embed_by_username).returns(nil) | ||
| poller.expects(:poll_feed).never | ||
| poller.execute({}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make the guard‑condition specs deterministic.
These examples can pass without exercising the intended guard if SiteSetting.feed_polling_enabled? is false by default. Stub all required settings so each test isolates exactly one missing requirement.
🛠️ Suggested update
it "requires feed_polling_url" do
+ SiteSetting.stubs(:feed_polling_enabled?).returns(true)
SiteSetting.stubs(:feed_polling_url).returns(nil)
+ SiteSetting.stubs(:embed_by_username).returns(embed_by_username)
poller.expects(:poll_feed).never
poller.execute({})
end
it "requires embed_by_username" do
+ SiteSetting.stubs(:feed_polling_enabled?).returns(true)
+ SiteSetting.stubs(:feed_polling_url).returns(url)
SiteSetting.stubs(:embed_by_username).returns(nil)
poller.expects(:poll_feed).never
poller.execute({})
end📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| it "requires feed_polling_url" do | |
| SiteSetting.stubs(:feed_polling_url).returns(nil) | |
| poller.expects(:poll_feed).never | |
| poller.execute({}) | |
| end | |
| it "requires embed_by_username" do | |
| SiteSetting.stubs(:embed_by_username).returns(nil) | |
| poller.expects(:poll_feed).never | |
| poller.execute({}) | |
| it "requires feed_polling_url" do | |
| SiteSetting.stubs(:feed_polling_enabled?).returns(true) | |
| SiteSetting.stubs(:feed_polling_url).returns(nil) | |
| SiteSetting.stubs(:embed_by_username).returns(embed_by_username) | |
| poller.expects(:poll_feed).never | |
| poller.execute({}) | |
| end | |
| it "requires embed_by_username" do | |
| SiteSetting.stubs(:feed_polling_enabled?).returns(true) | |
| SiteSetting.stubs(:feed_polling_url).returns(url) | |
| SiteSetting.stubs(:embed_by_username).returns(nil) | |
| poller.expects(:poll_feed).never | |
| poller.execute({}) | |
| end |
🤖 Prompt for AI Agents
In `@spec/jobs/poll_feed_spec.rb` around lines 18 - 27, Make the specs
deterministic by stubbing SiteSetting.feed_polling_enabled? to true in each
example and ensuring only the intended setting is nil; for the "requires
feed_polling_url" test stub SiteSetting.feed_polling_enabled? => true and stub
SiteSetting.embed_by_username to a valid non-nil value before calling
poller.execute, and for the "requires embed_by_username" test stub
SiteSetting.feed_polling_enabled? => true and stub SiteSetting.feed_polling_url
to a valid non-nil value so poller.execute only fails due to the specifically
tested missing setting (references: poller.execute,
poller.expects(:poll_feed).never, SiteSetting.stubs(:feed_polling_url),
SiteSetting.stubs(:embed_by_username),
SiteSetting.stubs(:feed_polling_enabled?)).
This pull request was automatically created by
@coderabbitai/e2e-reviewer.Batch created pull request.
Summary by CodeRabbit